AITopics | arc task

Collaborating Authors

arc task

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

ANPL: Towards Natural Programming with Interactive Decomposition Di Huang

Neural Information Processing SystemsFeb-17-2026, 11:23:28 GMT

Though LLMs are capable of generating plausible programs, it's challenging to interact with the LLMs further to revise the program, especially if the user's specific requirements are different from the initial proposal.

large language model, machine learning, programming language, (19 more...)

Neural Information Processing Systems

Country:

North America > Mexico (0.04)
Asia > China (0.04)
North America > United States > District of Columbia > Washington (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry: Education (0.93)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Communicating Natural Programs to Humans and Machines

Neural Information Processing SystemsDec-23-2025, 20:09:04 GMT

The Abstraction and Reasoning Corpus (ARC) is a set of procedural tasks that tests an agent's ability to flexibly solve novel problems. While most ARC tasks are easy for humans, they are challenging for state-of-the-art AI. What makes building intelligent systems that can generalize to novel situations such as ARC difficult? We posit that the answer might be found by studying the difference of $\textit{language}$: While humans readily generate and interpret instructions in a general language, computer systems are shackled to a narrow domain-specific language that they can precisely execute. We present LARC, the $\textit{Language-complete ARC}$: a collection of natural language descriptions by a group of human participants who instruct each other on how to solve ARC tasks using language alone, which contains successful instructions for 88\% of the ARC tasks. We analyze the collected instructions as `natural programs', finding that while they resemble computer programs, they are distinct in two ways: First, they contain a wide range of primitives; Second, they frequently leverage communicative strategies beyond directly executable codes. We demonstrate that these two distinctions prevent current program synthesis techniques from leveraging LARC to its full potential, and give concrete suggestions on how to build the next-generation program synthesizers.

communicating natural program, human and machine, name change, (6 more...)

Neural Information Processing Systems

Technology: Information Technology > Artificial Intelligence (1.00)

Add feedback

ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving

Kim, Sejin, Choi, Hayan, Lee, Seokki, Kim, Sundong

arXiv.org Artificial IntelligenceNov-18-2025

We present ARCTraj, a dataset and methodological framework for modeling human reasoning through complex visual tasks in the Abstraction and Reasoning Corpus (ARC). While ARC has inspired extensive research on abstract reasoning, most existing approaches rely on static input--output supervision, which limits insight into how reasoning unfolds over time. ARCTraj addresses this gap by recording temporally ordered, object-level actions that capture how humans iteratively transform inputs into outputs, revealing intermediate reasoning steps that conventional datasets overlook. Collected via the O2ARC web interface, it contains around 10,000 trajectories annotated with task identifiers, timestamps, and success labels across 400 training tasks from the ARC-AGI-1 benchmark. It further defines a unified reasoning pipeline encompassing data collection, action abstraction, Markov decision process (MDP) formulation, and downstream learning, enabling integration with reinforcement learning, generative modeling, and sequence modeling methods such as PPO, World Models, GFlowNets, Diffusion agents, and Decision Transformers. Analyses of spatial selection, color attribution, and strategic convergence highlight the structure and diversity of human reasoning. Together, these contributions position ARCTraj as a structured and interpretable foundation for studying human-like reasoning, advancing explainability, alignment, and generalizable intelligence.

machine learning, reinforcement learning, trajectory, (15 more...)

arXiv.org Artificial Intelligence

2511.11079

Genre: Research Report > New Finding (0.67)

Industry: Education > Educational Setting (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Reinforcement Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)

Add feedback

Don't throw the baby out with the bathwater: How and why deep learning for ARC

Cole, Jack, Osman, Mohamed

arXiv.org Artificial IntelligenceNov-3-2025

The Abstraction and Reasoning Corpus (ARC-AGI) presents a formidable challenge for AI systems. Despite the typically low performance on ARC, the deep learning paradigm remains the most effective known strategy for generating skillful (state-of-the-art) neural networks (NN) across varied modalities and tasks in vision, language etc. The deep learning paradigm has proven to be able to train these skillful neural networks and learn the abstractions needed in these diverse domains. Our work doubles down on that and continues to leverage this paradigm by incorporating on-the-fly NN training at test time. We demonstrate that fully committing to deep learning's capacity to acquire novel abstractions yields state-of-the-art performance on ARC. Specifically, we treat both the neural network and the optimizer (rather than just a pre-trained network) as integral components of the inference process, fostering generalization to unseen tasks. Concretely, we propose a methodology for training on ARC, starting from pretrained LLMs, and enhancing their ARC reasoning. We also propose Test-Time Fine-Tuning (TTFT) and the Augment Inference Reverse-Augmentation and Vote (AIRV) as effective test-time techniques. We are the first to propose and show deep learning can be used effectively for ARC, showing boosts of up to 260% in accuracy with AIRV and a further 300% boost with TTFT. An early version of this approach secured first place in the 2023 ARCathon competition, while the final version achieved the current best score on the ARC private test-set (58%). Our findings highlight the key ingredients of a robust reasoning system in unfamiliar domains, underscoring the central mechanisms that improve broad perceptual reasoning.

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2506.14276

Country: North America (0.28)

Genre: Research Report > New Finding (0.48)

Industry: Education (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

ANPL: Towards Natural Programming with Interactive Decomposition Di Huang

Neural Information Processing SystemsOct-9-2025, 09:09:21 GMT

large language model, machine learning, programming language, (19 more...)

Neural Information Processing Systems

Country:

North America > Mexico (0.04)
Asia > China (0.04)
North America > United States > District of Columbia > Washington (0.04)
(5 more...)

Genre:

Research Report > New Finding (1.00)
Research Report > Experimental Study (0.67)

Industry: Education (0.93)

Technology:

Information Technology > Software > Programming Languages (1.00)
Information Technology > Human Computer Interaction (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
(3 more...)

Add feedback

Impact of Noise on LLM-Models Performance in Abstraction and Reasoning Corpus (ARC) Tasks with Model Temperature Considerations

Khandalkar, Nikhil, Yadav, Pavan, Shinde, Krishna, Ramegowda, Lokesh B., Das, Rajarshi

arXiv.org Artificial IntelligenceApr-25-2025

Recent advancements in Large Language Models (LLMs) have generated growing interest in their structured reasoning capabilities, particularly in tasks involving abstraction and pattern recognition. The Abstraction and Reasoning Corpus (ARC) benchmark plays a crucial role in evaluating these capabilities by testing how well AI models generalize to novel problems. While GPT-4o demonstrates strong performance by solving all ARC tasks under zero-noise conditions, other models like DeepSeek R1 and LLaMA 3.2 fail to solve any, suggesting limitations in their ability to reason beyond simple pattern matching. To explore this gap, we systematically evaluate these models across different noise levels and temperature settings. Our results reveal that the introduction of noise consistently impairs model performance, regardless of architecture. This decline highlights a shared vulnerability: current LLMs, despite showing signs of abstract reasoning, remain highly sensitive to input perturbations. Such fragility raises concerns about their real-world applicability, where noise and uncertainty are common. By comparing how different model architectures respond to these challenges, we offer insights into the structural weaknesses of modern LLMs in reasoning tasks. This work underscores the need for developing more robust and adaptable AI systems capable of handling the ambiguity and variability inherent in real-world scenarios. Our findings aim to guide future research toward enhancing model generalization, robustness, and alignment with human-like cognitive flexibility.

large language model, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2504.15903

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Add feedback

Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task

Wu, Junjie, Yu, Mo, Liu, Lemao, Yeung, Dit-Yan, Zhou, Jie

arXiv.org Artificial IntelligenceFeb-10-2025

While LLMs have exhibited strong performance on various NLP tasks, it is noteworthy that most of these tasks rely on utilizing the vast amount of knowledge encoded in LLMs' parameters, rather than solving new problems without prior knowledge. In cognitive research, the latter ability is referred to as fluid intelligence, which is considered to be critical for assessing human intelligence. Recent research on fluid intelligence assessments has highlighted significant deficiencies in LLMs' abilities. In this paper, we analyze the challenges LLMs face in demonstrating fluid intelligence through controlled experiments, using the most representative ARC task as an example. Our study revealed three major limitations in existing LLMs: limited ability for skill composition, unfamiliarity with abstract input formats, and the intrinsic deficiency of left-to-right decoding. Our data and code can be found in https://wujunjie1998.github.io/araoc-benchmark.github.io/.

large language model, machine learning, natural language, (18 more...)

arXiv.org Artificial Intelligence

2502.0719

Country:

Europe > Austria > Vienna (0.14)
North America > United States > Connecticut > Fairfield County > Westport (0.04)
North America > United States > California > San Diego County > San Diego (0.04)
(2 more...)

Genre:

Research Report > New Finding (0.88)
Research Report > Experimental Study (0.68)

Industry: Government (0.34)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science (1.00)

Add feedback

The role of positional encodings in the ARC benchmark

Costa, Guilherme H. Bandeira, Freire, Miguel, Oliveira, Arlindo L.

arXiv.org Artificial IntelligenceJan-31-2025

The Abstraction and Reasoning Corpus challenges AI systems to perform abstract reasoning with minimal training data, a task intuitive for humans but demanding for machine learning models. Using CodeT5+ as a case study, we demonstrate how limitations in positional encoding hinder reasoning and impact performance. This work further examines the role of positional encoding across transformer architectures, highlighting its critical influence on models of varying sizes and configurations. Comparing several strategies, we find that while 2D positional encoding and Rotary Position Embedding offer competitive performance, 2D encoding excels in data-constrained scenarios, emphasizing its effectiveness for ARC tasks

large language model, machine learning, natural language, (19 more...)

arXiv.org Artificial Intelligence

2502.00174

Country:

North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
North America > United States > California > Los Angeles County > Long Beach (0.04)
Asia > Singapore (0.04)
Asia > Indonesia > Bali (0.04)

Genre: Research Report > New Finding (0.69)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.91)

Add feedback

NSA: Neuro-symbolic ARC Challenge

Batorski, Paweł, Brinkmann, Jannik, Swoboda, Paul

arXiv.org Artificial IntelligenceJan-8-2025

The Abstraction and Reasoning Corpus (ARC) challenge [7] is a difficult few-shot benchmark for testing visual reasoning capabilities of machine learning models. The capabilities The Abstraction and Reasoning Corpus (ARC) evaluates of recent general-purpose LLM systems are, as of general reasoning capabilities that are difficult for both now, not good enough to solve ARC at human performance machine learning models and combinatorial search methods. in a reasonably limited amount of time [19, 20, 28]. Arguably We propose a neuro-symbolic approach that combines their pre-training seems to have not imbued them a transformer for proposal generation with combinatorial with enough of the necessary concepts required to solve search using a domain-specific language. The transformer ARC tasks reliably and without an excessive number of narrows the search space by proposing promising search directions, tries. It is unclear whether LLMs lack the correct level of which allows the combinatorial search to find the abstraction and the specific type of high-level visual reasoning actual solution in short time.

large language model, machine learning, transformation primitive, (16 more...)

arXiv.org Artificial Intelligence

2501.04424

Genre: Research Report (0.64)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Search (1.00)
Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Cognitive Science > Problem Solving (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.89)

Add feedback

Filters

Collaborating Authors

arc task

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

182aed0379591ebd1d655b2bdc152075-Paper-Datasets_and_Benchmarks.pdf

ANPL: Towards Natural Programming with Interactive Decomposition Di Huang

Communicating Natural Programs to Humans and Machines

ARCTraj: A Dataset and Benchmark of Human Reasoning Trajectories for Abstract Problem Solving

Don't throw the baby out with the bathwater: How and why deep learning for ARC

ANPL: Towards Natural Programming with Interactive Decomposition Di Huang

Impact of Noise on LLM-Models Performance in Abstraction and Reasoning Corpus (ARC) Tasks with Model Temperature Considerations

Understanding LLMs' Fluid Intelligence Deficiency: An Analysis of the ARC Task

The role of positional encodings in the ARC benchmark

NSA: Neuro-symbolic ARC Challenge